5  Genomics and Epigenomics Analysis

5.1 On this page

Biological insights and take-home messages are at the bottom of the page at Lesson Learnt: Section 5.5.

  • Here we investigate all the genomics and epigenomics data available for the three kidney carcinomas.
  • First, we analyse the gene Copy Number Variants (CNVs).
    • We test CNVs distribution across the three kidney carcinomas and investigate any signature associated with them.
    • We perform some exploratory analyses on samples, CNVs and clinical covariates.
  • The, we look into the somatic mutations detected across the three kidney carcinomas.
    • We test the somatic mutations distribution across the three kidney carcinomas and investigate any signature associated with them.
    • We perform some exploratory analyses on samples, somatic mutations and clinical covariates.
  • Finally, we focus on DNA methylation and epigenomics signals.
    • We perform a QC on probe intensities (corresponding to CpG islands different methylation levels), their distribution across samples and we impute eventual missing values.
    • We do some exploratory analyses on samples, probe intensities and clinical covariates.
    • We then run a formal Differential Expression analysis to identify probes that have different methylation levels across the three Kidney cancer types, and the genes associated to them.
    • We perform Gene Set Enrichment Analyses on the genes associated with differentially methylated CpG islands to investigate biological and molecular themes that discriminates between the three Kidney cancer types.

5.2 Copy Number Variants (CNVs)

5.2.1 Filtering & QC

The copy number variants are available at TCGA as information collapsed at the gene level. CNVs information are reported for 24,776 genes. For each gene, we have an integer value ranging from -2 (complete deletion of the genomic region) to +2 (complete duplication of both alleles), with 0 indicating no CNV event for that gene.

All of the 24,776 reported genes are affected by at least a CNV event across 877 kidney carcinomas samples, only for 10 samples we do not have CNV information.

Patients affected by KICH kidney carcinomas seem to have a significantly higher number of genes affected by CNVs, followed by KIRP patients. KIRC patients have the least amount of genes affected by CNVs. In KICH and KIRP patients, also, CNVs involving gene duplications are more abundant than the ones involving gene deletions.

Figure 1: CNVs distributions across kidney carcinomas.

5.2.2 CNVs signatures across Kidney cancers

Let’s now investigate the presence of any CNV signature across the three kidney carcinomas. As mentioned before, the provided TCGA CNVs data are reported at the gene level, but obviously, Whole Genome Duplication or Chromosomal Aberration events would usually be larger than a single gene and span several genes. Since we do not have access to the raw genomics data, a first naive approach would be to walk each chromosome and bridge together as single CNV event regions between consecutive genes having the same CNV score as reported by TCGA. This would allow to collapse the information of 24,776 different segments (the reported genes) into tens or hundreds of long, consecutive genomics ranges. The approach would work well under the assumption that large CNV events are much more likely to happen than short, isolated CNV events, i.e.: spanning just one or few genes. Moreover, the available CNV data are limited to the gene level (~1-2% of total human genome length), which implies that we have no direct information on CNV events (or lack thereof) over 98% of the genome. So, we have decided to do not bridge the genomic regions across consecutive genes with the same CNV score as single CNV events, and keep the downstream CNV analyses at the gene level.

Figure 2: CNVs distributions at the gene level across 877 kidney carcinomas biopsies.

As seen in Figure 1, biopsies from KICH patients show extensive CNV events that seems to span most of the coding genome. CNVs events seems to have similar patterns across the different kidney carcinomas, and they appear to span full-length chromosomes. Within each kidney carcinoma type, it seems like biopsies could be clustered in different subtypes dependeing on their CNV signatures.

In the next steps, we will try to correlates these CNV signatures to other clinical covariates.

5.2.3 Dimesionality Reduction and Dataset Exploration

5.2.3.1 UMAP on CNVs data

Let’s check how the samples cluster on a UMAP. For transcriptomics data (see Section 2.3.1) we have drawn the UMAP based on the top 1,000 most variable genes. Given the nature (and distribution) of the underlying CNV data, we have decided to draw the UMAP based on the top 1,000, 2,000, 3,000, 4,000, 5,000 and 10,000 most variable CNV-affected genes.

Increasing the number of genes to compute the UMAP improved the clustering between the samples of the different kidney carcinomas. A cluster of overlapping biopsies from the three kidney carcinomas is visible on the right hand side of N=1000, N=2000, N=3000, N=10000 and on the left-hand side of N=5000 suggesting that CNVs can poorly discriminate between KIRC, KIRP and KICH only on a subset of CNV subtypes.

Figure 3: Kidney CNVs UMAPs based on top N most variable CNV-affected genes.

5.2.3.2 Principal Component Analysis (PCA)

As we did for the other omics data, the next step in the dataset exploration is to perform the Principal Component Analysis.

Figure 4: PCA, exploration.

The first 18 Principal Components capture more than 80% of the variance in the Kidney cancers CNVs dataset, with the first two components (PC1 and PC2) capturing a bit more than 33% of the variance.

When we project the samples in the PC1 and PC2, we can see that the PC1 separates KIRC, KICH and KIRP, which instead cluster together. As observed in the UMAP, the clustering is more noisy than what observed for the other omics analyses (transcriptomics, proteomics and micro-RNAs).

Figure 5: PCA, biplot PC1 x PC2.

We can also investigate other dimensions Principal Components, to see if there is a component that manages to fully resove the three cancer types. PC1 PC4 seems to best separate samples from KIRC, KIRP and KICH.

Figure 6: PCA, pairs plots.

Let’s check the Pearson correlation with other clinical covariates.

Cancer_type correlates well with PC1 and PC4. Most of the TCGA-defined molecular subtypes correlates with PC1, PC2 and PC4. Interestingly, TCGA Subtype_CNA, which should be based on he CNV data, correlates with PC3 and PC4, but not PC1. PC3 is interesting since it correlates with follow_up_tumor_status, pathologic_t and tumor_stage, suggesting a link between CNVs signatures and cancer advancement, which is a common feature across all kidney carcinomas and not limited to KIRC, KIRP or KICH.

Figure 7: PCA, correlations between clinical covariates and Principal Components.

5.3 Somatic mutations

Let’s now dig into the Somatic Mutations detected in the biopsies of the kidney carcinomas patients. We have somatic information for 711 samples (~80% of the total 887 patients in the cohort). TCGA provides information on somatic mutations as binary information for each gene: 1, if the patient has a non-silent mutation for that gene, or 0, if the patient does not have a non-silent mutation on that gene. We could retrieve info for for 40,542 genes, and 13,584 of them had a non-silent mutations in at least one of our 711 kidney carcinomas patients. The first step was then to discard from downstream analysis the 26,958 that appeared not mutated in the kidney carcinomas biopsies.

Let’s now look at the distribution of somatic mutations across the different kidney carcinomas. KICH patients had an average of 31 genes with non-mutations, against the 53 and 56 average number of genes with non-silent mutations in KIRC and KIRP patients, respectively. Chromosomes 1, 2, 3, 11, 17 nad 19 where the chromosomes with more mutated genes on average.

Figure 8: Non-silent somatic mutations distributions.

5.3.1 Mutational signatures across Kidney cancers

Let’s now investigate the presence of any non-silent somatic mutation signature across the three kidney carcinomas.

Figure 9: non-silent somatic mutations distributions across 711 kidney carcinomas biopsies.

While for KICH and KIRP patients the non-silent somatic mutations seems to be evenly distributed across the genome, KIRC patients clearly show an over-abundance of 3 preferably mutated genes, one in chromosome 2 and two in chromosome 3.

Using external_gene_name as id variables

In KICH, ~30% of the patients (21 / 66) had a non-silent mutation in TP53, followed by 9% of patients (6 / 66) having a mutation on PTEN. As expected, KIRC patients show the highest mutational burden, with ~47% of patients (170 / 365) with mutation in VHL, ~40% of patients (145 / 365) with mutation on PBRM1 and ~20% of patients (72 / 365) with non-silent somatic mutations on TTN. As well, 16% of KIRP patients (45 / 280) has non-silent somatic mutations on TTN gene.

Figure 10: top 10 genes with non-silent somatic mutations across the kidney carcinomas.

In the table below are reported all the 19,573 somatic mutations detected across the biopsies of the three kidney carcinomas.

5.3.2 Dimesionality Reduction and Dataset Exploration

5.3.2.1 UMAP on somatic mutations data

Let’s check how the samples cluster on a UMAP. As for the CNVs data (see Section 5.2.3.1), given the nature (and distribution) of the underlying non-silent mutations data, we have decided to draw the UMAP based on the top 1,000, 2,000, 3,000, 4,000, 5,000 and 10,000 most variable affected genes.

The non-silent somatic mutation data did not resulted in any interpretable cluster.

Figure 11: UMAP based on non-silent somatic mutation data of kidney carcinomas.

5.3.2.2 Principal Component Analysis (PCA)

As we did for the other omics data, the next step in the dataset exploration is to perform the Principal Component Analysis.

Figure 12: PCA, exploration.

The PCA captures little to no variance observed in the non-silent somatic mutations, moreover, it fails to cluster the biopsies based on the kidney carcinoma type of origin, as seen with the UMAP. Few samples seems to be outliers driving most of the variance observed in PC1 (1.82% variance) and PC2 (1.37% variance).

PC4 and PC6 seems to separate KIRC samples into four distinct clusters, one of which overlapping with a cluster containing as well KIRP and KICH biopsies.

Figure 13: PCA, pairs plots.

PC4 (0.66% of observed variance) seems to correlate with some clinical covariates, such as histological_grade and subtype_mRNA, but it is difficult to attach a biological relevance to this observation since we cannot properly cluster samples on the PCAs.

Figure 14: PCA, correlations between clinical covariates and Principal Components.

5.4 Epigenomics

Now, we move forward to analyse the epigenomics data, that provides information on the methylation status of the CpG islands across the genomes of the kidney carcinomas. The methylation status is captured with Illumina 450k arrays, and for each CpG island we have two measurements: a methylated intensity (M) and an unmethylated intensity (U) that allows for calculating the propostion of methilation at each CpG island.

For the TCGA epigenomics data, for each CpG island we have available the corresponding Beta value, which is calculated as:

$

=

$

Under ideal conditions, a value of zero indicates that all copies of the CpG site in the sample were completely unmethylated (no methylated molecules were measured) and a value of one indicates that every copy of the site was methylated.

5.4.1 Filtering & QC

We have methylation information for 646 out of the 887 kidney carcinoma biopsies: 65 KICH, 311 KIRC and 270 KIRP patients, respectively. TCGA provides methylation status as beta values for 148,068 whitelisted probes. Out of them, 6,364 probes consistently have missing values across the 646 samples, so after filtering them out we obtain 141,704 probes.

Plotting the beta-values distribution for the probes of each biopsies reveals their bimodal distribution, indicating that for each sample, the probes can either be unmethylated (beta value = 0) or methylated (beta value = 1).

Figure 15: Distribution of whitelisted beta values densities across the 646 kidney carcinoma biopsies.

5.4.2 Dimesionality Reduction and Dataset Exploration

5.4.2.1 UMAP on epigenomics data

Let’s check the samples clustering on a UMAP based on epigenomics data. Methylation data manages to clusters KIRC, KICH and KIRP biopsies according to their kidney carcinoma type. While KIRC and KIRP clusters are well defined, the KICH cluster seems to include as well samples from KIRC and KIRP, suggesting that the top 1,000 most variable probes fail to fully resolve the cancer types, or that different subtypes with similar methylation patterns exists across KIRC, KIRP and KICH.

Figure 16: Kidney epigenomics UMAP.

5.4.2.2 Principal Component Analysis (PCA)

The next step in the dataset exploration is to perform the Principal Component Analysis.

Figure 17: PCA, exploration.

The first 10 Principal Components capture less than 50% of the variance in the Kidney cancers epigenomics dataset, with the first two components (PC1 and PC2) capturing a bit more than 20% of the variance. The screeplot, however, shows a more reasonable amount of variance capture in the top Principal Components, in contrasts to what seen using non-silent somatic mutations (see Section 5.3.2.2).

When we project the samples in the PC1 and PC2, we can see that the PC1 (~13% of observed variance) creates a gradient of samples that separates KICH, KIRC and KIRP. The second component PC2 (~10% of variance), instead, does not separates the samples based on their kidney carcinoma types.

Figure 18: PCA, biplot PC1 x PC2.

We can also investigate other dimensions Principal Components, to see if there is a component that manages to fully resolve the three cancer types. PC1xPC4 and PC1xPC5 indeed seem to separates better the three kidney carcinoma types.

Figure 19: PCA, pairs plots.

Let’s check the loadings for the top 5 Principal Components. These indicate which probes are the more responsible to explain the position of the samples along the components, and the direction of this separation.

Looking at the top 1% most variable probes, the following 11 probes are the top loadings for the first 5 Principal Components:

  • cg03830585
  • cg07037412
  • cg07159758
  • cg08135406
  • cg05098566
  • cg07635227
  • cg00868875
  • cg07124687
  • cg01916088
  • cg03893663
  • cg00204802
  • cg01233786
  • cg00806644

Let’s now check their beta values distributions of the five top genes identified with the PCA across the cancer types:

Figure 20: PCA, loadings.

cg03830585 is associated with the Inositol 1,4,5-Trisphosphate Receptor, Type 1 ITPR1. ITPR1 is a novel target of novel target of HIF2α and protects renal cancer cells against natural killer cells by inducing autophagy. cg03830585 appears hypermethylated in KIRP, but not in KICH (cold tumor) and KIRC (hot tumor, with immuno-suppressant and immuni-evasive profile). cg00868875 is linked to the Potassium Channel Tetramerization Domain Containing 1 KCTD1, which is a prognostic marker for KIRP. Finally, cg07037412 is linked to the Calcium Voltage-Gated Channel Subunit Alpha1 H CACNA1H. Lower methylation status in KIRC and KIRP is congruent with the observation that CACNA1H was specifically overexpressed relative to normal tissue samples in renal cancer, sarcoma and gastrointestinal stromal tumors.

Let’s now check the Pearson correlation with other clinical covariates.

Cancer type strongly correlates with PC1 (~13% of observed variance) and PC5 (~3% of observed variance), which is concordant with what we have observed in the PCA pairplots: the three kidney carcinoma types are well separated in these components. PC1 also correlates well with hystological grade, selected subtype, subtype miRNA and subtype mRNA. PC2 (~10% of observed variance) and PC3 (~6% of observed variance) strongly correlates with subtype integrative, but more interestingly with subtype DNAmeth, histological grade, tumor stage, follow up tumor status and vital status. Indeed, the PCA pairplots shows that PC2 and PC3 cluster and separate very well biopsies from KICH, KIRC and KIRP patients, despite these two components not correlating with cancer type. More interestingly, subtype DNAmeth co-correlates with residual disease, more advanced cancer and worse prognosis, suggesting that DNA methylations and epigenomics plays a major role in cancer evolution and disease outcome.

Figure 21: PCA, correlations between clinical covariates and Principal Components.

5.4.3 Differential Methylation Analysis

Let’s move forward now to the differential Methylation analysis. We will take two complementary approaches:

  • Probe-Wise Differential Methylation
  • Gene / Region-Wise Differential Methylation

The Probe-Wise Differential Methylation has a higher granularity (down to the single probe) allowing for a more precise characterization of the methylation status across the kidney carcinomas, but with the shortcoming of a more complex interpretation of the results. Gene / Region-Wise Differential Methylation approach, on the other hand, summarize the methylation status at the gene or at the region level, making comparisons and interpretation easier, with the downside of losing the signal of the methylation status of each CpG island.

5.4.3.1 Probe-Wise Differential Methylation

For epigenomics, we use two arbitrary thresholds to retain genes that are significantly differentially methylated across the each comparison: an absolute logFold-Change (logFC) higher than 2, and an adjusted p-value lower than 0.05. This results in:

  • KIRC_vs_KICH, 5,672 differentially methylated probes, 3,867 hypermethylated and 1,805 hypomethylated
  • KIRP_vs_KICH, 8,024 differentially methylated probes, 6,403 hypermethylated and 1,621 hypomethylated
  • KIRC_vs_KIRP, 954 differentially methylated probes, 163 hypermethylated and 791 hypomethylated

As usual, we can visualize the logFC p-values relationships for all the probes with a volcano plot.

KICH appears to have the highest amount of hypomethylated probes, suggesting that many of their genes are expressed due to the hypomethylation of the corresponding CpG islands. On the other hand, KIRP seems to have the highest amount of hypermethylated probes of the three kidney carcinoma types, followed by KIRC.

Figure 22: Volcano plots of each contrats reporting the differentially methylated probes.

The UpSet plot shows that KICH has a strong signature of 3,193 hypomethylated probes that are instead hypermethylated in KIRC and KIRP, and 999 hypermethylated probes that are insted hypomethylated in KIRC and KIRP. This methylation pattern can be ascribed to the fact that KICH carcinomas in general maintain their epithelial differentiation status, which instead is more often lost in KIRC and KIRP carcinomas. KIRP appears as the kidney carcinomas with most hypermethylated probes when compared to KIRC and KICH.

Figure 23: UpSet reporting the genes differentially methylated probes in common across all contrasts.

Tables of differentially methylated probes.

Let’s see if any function is enriched in the genes of linked to the Differentially Methylated Probes by performing Gene Set Enrichment Analysis (GSEA). The first step is to identify the list of linked to the differentially methylated probes. Then, we will run standard GSEA on these list of differentially methylated genes.

5.4.3.1.1 GO Biological Process terms

Gene Ontology Biological Process are the larger processes or ‘biological programs’ accomplished by the concerted action of multiple molecular activities. Examples of broad BP terms are “DNA repair” or “signal transduction”. Examples of more specific terms are “cytosine biosynthetic process” or “D-glucose transmembrane transport”.

Figure 24: dotplot of enriched GO Biological Process terms.

Figure 25: Term maps reporting relationships between GO Biological Process across the different contrasts.

Figure 26: gene maps reporting relationships between GO Biological Process enriched across the different contrasts.

The enriched GO BP terms in KIRC vs. KIRP and KIRC vs. KICH suggest that KIRC has a strong enrichment in its catabolic metabolism, with enriched terms such as “carbohydrate catabolic process”, “glycogen catabolic process” and “glycogen metabolic process”. Enrichment of these terms are driven by the overabundance of Glycogen Phosphorylase B, L and M (PYGB, PYGL, PYGM), Lactate Dehydrogenase A (LDHA) and Glyceraldehyde-3-Phosphate Dehydrogenase (GAPDH) in KIRC when compared to KIRP and KICH.

KICH seems instead to retain more its differentiation as highlighted by the lower abundance of Poly(ADP-Ribose) Polymerase 1 (PARP1) when compared to KIRC and KIRP, resulting in the enrichment of GO BP terms related to “Negative regulation of telomere maintainance”.

The complete list of 227 GO BP terms enriched across the three contrasts is reported below.

5.4.3.1.2 GO Cellular Compartments terms

Gene Ontology Cellular Compartment serves to capture the cellular location where a molecular function takes place.

Figure 27: dotplot of enriched GO Cellular Compartments terms.

Figure 28: Term maps reporting relationships between GO Cellular Compartments across the different contrasts.

Figure 29: gene maps reporting relationships between GO Cellular Compartments enriched across the different contrasts.

GO CC enrichment analyses do not provides any further insights on the characterization of kidney carcinomas based on their proteomics profiles.

The complete list of 23 GO CC terms enriched across the three contrasts is reported below.

5.4.3.1.3 GO Molecular Function terms

Gene Ontology Molecular Function serves to represent molecular-level activities performed by gene products.

Figure 30: dotplot of enriched GO Molecular Functions terms.

Figure 31: Term maps reporting relationships between GO Molecular Functions across the different contrasts.

Figure 32: gene maps reporting relationships between GO Molecular Functions enriched across the different contrasts.

Enrichment of GO MF terms confirm the catabolic signature of KIRC when compared to KIRP and KICH, with enriched terms such as “glycosiltransferase activity” and “pyridoxal phosphate binding” driven by the over abundance of PYGB, PYGL and PYGM.

The complete list of 35 GO MF terms enriched across the three contrasts is reported below.

5.4.3.1.4 KEGG pathways

KEGG pathways are manually curated and capture our knowledge of the molecular interaction, reaction and relation networks.

Figure 33: dotplot of enriched KEGG pathways.

Figure 34: Term maps reporting relationships between KEGG pathways enriched across the different contrasts.

Figure 35: gene maps reporting relationships between KEGG pathways enriched across the different contrasts.

Also the enriched KEGG pathways revolved around the metabolic disregulation of KIRC (e.g.: “Glycolysis / Gluconeogenesis” and “Starch and sucrose metabolism”) when compared to KIRP and KICH, linked as well to “Necroptosis” and “HIF-1 signaling pathway”.

The complete list of 30 KEGG pathways across the three contrasts is reported below.

5.4.3.1.5 Reactome pathways

Similarly to KEGG, Reactome pathways are manually curated biological pathways.

Figure 36: dotplot of enriched Reactome pathways.

Figure 37: Term maps reporting relationships between Reactome pathways enriched across the different contrasts.

Figure 38: gene maps reporting relationships between Reactome pathways enriched across the different contrasts.

Again, enriched Reactome pathways characterized KIRC “Glycogen metabolism”, “Metabolosim of charbohydrates” and “Pyruvate metabolism” when compared to KIRP and KICH kidney carcinomas.

The complete list of 10 Reactome pathways enriched across the three contrasts is reported below.

5.4.3.1.6 WikiPathways

Lastly, also WikiPathways provides another angle on a set of manually curated biological pathways.

Figure 39: dotplot of enriched Wikipathways pathways.

Figure 27: gene maps reporting relationships between WikiPathways enriched across the different contrasts.

The enriched WikiPathways confirmed the “Aerobiuc glycolysis - augmented” in KIRC, when compared to KIRP and KIRCH, as well as a proper “Clear cell renal cell carcinoma pathwyas”, driven by the abundance of GAPDH, LDHA and LDHB.

KICH retention of differentiation was supported by the abundance of the ATM Serine/Threonine Kinase ATM and PARP1, which resulted in the enrichment of terms such as “DNA IR-double strand breaks and cellular response via ATM”.

The complete list of 53 WikiPathways pathways enriched across the three contrasts is reported below.

5.4.3.1.7 Aggregate probes into functional regions

Often, differential methylation of a single CpG is not so informative or can be hard to detect. Therefore, knowing whether several CpGs near to each other (or regions) are concordantly differentially methylated can be of greater interest.

There are several Bioconductor packages that have functions for identifying differentially methylated regions from 450k data. Some of the most popular are the dmrFind function in the charm package, which has been somewhat superseded for 450k arrays by the bumphunter function in minfi, and, the dmrcate in the DMRcate package. They are each based on different statistical methods, but we will be using dmrcate here, as it is based on limma and thus we can use the design and contrast matrix we defined earlier.

We will again start from our matrix of M-values. For this kind of analysis, this matrix has to be annotated with the chromosomal position of the CpGs and their gene annotations. Because in a first step the limma differential methylation analysis for single CpGs will be run again, we need to specify the design matrix, contrast matrix and contrast of interest.

Once we have the relevant statistics for the individual CpGs, we can then use the dmrcate function to combine them to identify differentially methylated regions. Of particular interest here is the lambda parameter; this value is the number of nucleotides that is allowed between significant CpGs before splitting them up in different regions. So a smaller lambda will result in more but smaller regions. For array data, the authors of the dmrcate package currently recommend a lambda of 1000. The main output table DMRs contains all of the regions found, along with their genomic annotations and p-values. To inspect this object and further visualization, you can best use the extractRanges function to create a GRanges object.

Differentially Methylated regions

KIRC_vs_KICH: 20346 KIRP_vs_KICH: 19243 KIRC_vs_KIRP: 20773

Just as for the single CpG analysis, it is a good idea to visually inspect the results to make sure they make sense. For this, use the DMR.plot function. By default, this plot draws the location of the DMR in the genome, the position of nearby genes, the positions of the CpG probes, the Beta value levels of each sample as a heatmap and the mean methylation levels for the various sample groups in the experiment.

5.4.3.2 Functional Regions mCSEA

An alternative approach to detect DMRs is to predefine the regions to be tested; so, as opposed to the previous approach where the regions are defined according to heuristic distance rules we can define regions based on a shared function. For this, we will used the package mCSEA which contains three types of regions for 450K and EPIC arrays: promoter regions, gene body and CpG Islands. mCSEA is based on Gene Set Enrichment analysis (GSEA), a popular methodology for functional analysis that was specifically designed to avoid some drawbacks in the field of gene expression. Briefly, CpG sites are ranked according to a metric (logFC, t-statistic, …) and an enrichment score (ES) is calculated for each region. This is done by running through the entire ranked CpG list, increasing the score when a CpG in the region is encountered and decreasing the score when the gene encountered is not in the region. A high ES indicates these probes are found high up in the ranked list. In other words, a high (N)ES value means that for the CpG sites in this region there is - on average - a shift towards a higher methylation level. This approach has been shown to be more effective to detect smaller but consistent methylation differences.

promoters

genes

85promoters, 150 genes

74 promoters 0 genes

promoters

genes

172 promoters, 50 genes

promoters

genes

cc

Figure 8: dotplot of enriched GO Biological Process terms

ss

5.4.3.2.1 mCSEA Functional Enrichments

Let’s see if any function is enriched in the genes of the Differentially Methilated Probes. Simply extract the methylated probes, compare agains the background ans see if the associated genes are diofferentialy methylated, and see the fucntions / pathways associated with them.

d

Figure 8: dotplot of enriched GO Biological Process terms

Figure 8: dotplot of enriched GO Biological Process terms

Figure 8: dotplot of enriched GO Biological Process terms

Figure 17: dotplot of enriched KEGG pathways

Figure 17: dotplot of enriched KEGG pathways

Figure 20: dotplot of enriched Reactome pathways

Figure 20: dotplot of enriched Reactome pathways

Figure 20: dotplot of enriched Wikipathways pathways

c

5.5 Lessons Learnt

[[[[PROPER DESCRIPTION OF FINDINGS AND TRANSCRIPTOMICS ENRICHMENTS]]]]

So far, we have learnt:

  • A

5.6 Session Information

R version 4.3.2 (2023-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: openSUSE Tumbleweed

Matrix products: default
BLAS/LAPACK: /home/andrea/miniforge3/envs/moai/lib/libmkl_rt.so.2;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=it_IT.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Brussels
tzcode source: system (glibc)

attached base packages:
 [1] parallel  grid      stats4    stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] DMRcatedata_2.20.3                                 
 [2] ExperimentHub_2.10.0                               
 [3] AnnotationHub_3.10.1                               
 [4] BiocFileCache_2.10.2                               
 [5] dbplyr_2.5.0                                       
 [6] UpSetR_1.4.0                                       
 [7] umap_0.2.10.0                                      
 [8] stringr_1.5.1                                      
 [9] scales_1.3.0                                       
[10] RColorBrewer_1.1-3                                 
[11] qs_0.27.2                                          
[12] PCAtools_2.14.0                                    
[13] missMethyl_1.36.0                                  
[14] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
[15] mCSEA_1.22.0                                       
[16] Homo.sapiens_1.3.1                                 
[17] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2            
[18] GO.db_3.18.0                                       
[19] OrganismDbi_1.44.0                                 
[20] GenomicFeatures_1.54.4                             
[21] mCSEAdata_1.22.0                                   
[22] org.Hs.eg.db_3.18.0                                
[23] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1 
[24] minfi_1.48.0                                       
[25] bumphunter_1.44.0                                  
[26] locfit_1.5-9.10                                    
[27] iterators_1.0.14                                   
[28] foreach_1.5.2                                      
[29] Biostrings_2.70.3                                  
[30] XVector_0.42.0                                     
[31] SummarizedExperiment_1.32.0                        
[32] MatrixGenerics_1.14.0                              
[33] matrixStats_1.5.0                                  
[34] Gviz_1.46.1                                        
[35] gridExtra_2.3                                      
[36] GenomicRanges_1.54.1                               
[37] GenomeInfoDb_1.38.8                                
[38] forcats_1.0.0                                      
[39] EnhancedVolcano_1.20.0                             
[40] ggrepel_0.9.6                                      
[41] ggplot2_3.5.1                                      
[42] edgeR_4.0.16                                       
[43] limma_3.58.1                                       
[44] DT_0.33                                            
[45] dplyr_1.1.4                                        
[46] DOSE_3.28.2                                        
[47] DMRcate_2.16.1                                     
[48] data.table_1.16.4                                  
[49] cowplot_1.1.3                                      
[50] clusterProfiler_4.10.1                             
[51] circlize_0.4.16                                    
[52] BiocSingular_1.18.0                                
[53] BiocParallel_1.36.0                                
[54] AnnotationDbi_1.64.1                               
[55] IRanges_2.36.0                                     
[56] S4Vectors_0.40.2                                   
[57] Biobase_2.62.0                                     
[58] BiocGenerics_0.48.1                                

loaded via a namespace (and not attached):
  [1] R.methodsS3_1.8.2             dichromat_2.0-0.1            
  [3] progress_1.2.3                nnet_7.3-20                  
  [5] HDF5Array_1.30.1              vctrs_0.6.5                  
  [7] RApiSerialize_0.1.4           digest_0.6.37                
  [9] png_0.1-8                     shape_1.4.6.1                
 [11] deldir_2.0-4                  permute_0.9-7                
 [13] magick_2.8.5                  MASS_7.3-60.0.1              
 [15] reshape_0.8.9                 reshape2_1.4.4               
 [17] httpuv_1.6.15                 qvalue_2.34.0                
 [19] withr_3.0.2                   xfun_0.50                    
 [21] ggfun_0.1.8                   survival_3.8-3               
 [23] doRNG_1.8.6.1                 memoise_2.0.1                
 [25] gson_0.1.0                    tidytree_0.4.6               
 [27] GlobalOptions_0.1.2           gtools_3.9.5                 
 [29] R.oo_1.27.0                   Formula_1.2-5                
 [31] prettyunits_1.2.0             KEGGREST_1.42.0              
 [33] promises_1.3.2                httr_1.4.7                   
 [35] restfulr_0.0.15               rhdf5filters_1.14.1          
 [37] stringfish_0.16.0             rhdf5_2.46.1                 
 [39] rstudioapi_0.17.1             generics_0.1.3               
 [41] reactome.db_1.86.2            base64enc_0.1-3              
 [43] curl_6.2.0                    zlibbioc_1.48.2              
 [45] ScaledMatrix_1.10.0           ggraph_2.2.1                 
 [47] polyclip_1.10-7               GenomeInfoDbData_1.2.11      
 [49] quadprog_1.5-8                SparseArray_1.2.4            
 [51] RBGL_1.78.0                   interactiveDisplayBase_1.40.0
 [53] xtable_1.8-4                  evaluate_1.0.3               
 [55] S4Arrays_1.2.1                preprocessCore_1.64.0        
 [57] hms_1.1.3                     irlba_2.3.5.1                
 [59] colorspace_2.1-1              filelock_1.0.3               
 [61] readxl_1.4.3                  reticulate_1.40.0            
 [63] magrittr_2.0.3                readr_2.1.5                  
 [65] later_1.4.1                   viridis_0.6.5                
 [67] ggtree_3.10.1                 lattice_0.22-6               
 [69] genefilter_1.84.0             XML_3.99-0.18                
 [71] shadowtext_0.1.4              Hmisc_5.2-2                  
 [73] pillar_1.10.1                 nlme_3.1-167                 
 [75] compiler_4.3.2                beachmat_2.18.1              
 [77] RSpectra_0.16-2               stringi_1.8.4                
 [79] GenomicAlignments_1.38.2      plyr_1.8.9                   
 [81] crayon_1.5.3                  abind_1.4-8                  
 [83] BiocIO_1.12.0                 gridGraphics_0.5-1           
 [85] graphlayouts_1.2.2            bit_4.5.0.1                  
 [87] fastmatch_1.1-6               codetools_0.2-20             
 [89] openssl_2.3.1                 crosstalk_1.2.1              
 [91] bslib_0.8.0                   biovizBase_1.50.0            
 [93] multtest_2.58.0               mime_0.12                    
 [95] splines_4.3.2                 Rcpp_1.0.14                  
 [97] sparseMatrixStats_1.14.0      HDO.db_0.99.1                
 [99] cellranger_1.1.0              interp_1.1-6                 
[101] knitr_1.49                    blob_1.2.4                   
[103] BiocVersion_3.18.1            AnnotationFilter_1.26.0      
[105] fs_1.6.5                      checkmate_2.3.2              
[107] DelayedMatrixStats_1.24.0     ggplotify_0.1.2              
[109] tibble_3.2.1                  Matrix_1.6-5                 
[111] statmod_1.5.0                 tzdb_0.4.0                   
[113] tweenr_2.0.3                  pkgconfig_2.0.3              
[115] tools_4.3.2                   cachem_1.1.0                 
[117] RSQLite_2.3.9                 viridisLite_0.4.2            
[119] DBI_1.2.3                     graphite_1.48.0              
[121] fastmap_1.2.0                 rmarkdown_2.29               
[123] Rsamtools_2.18.0              sass_0.4.9                   
[125] patchwork_1.3.0               BiocManager_1.30.25          
[127] VariantAnnotation_1.48.1      graph_1.80.0                 
[129] scrime_1.3.5                  rpart_4.1.24                 
[131] farver_2.1.2                  tidygraph_1.3.1              
[133] scatterpie_0.2.4              yaml_2.3.10                  
[135] latticeExtra_0.6-30           foreign_0.8-88               
[137] rtracklayer_1.62.0            illuminaio_0.44.0            
[139] cli_3.6.3                     purrr_1.0.2                  
[141] siggenes_1.76.0               GEOquery_2.70.0              
[143] lifecycle_1.0.4               askpass_1.2.1                
[145] backports_1.5.0               annotate_1.80.0              
[147] gtable_0.3.6                  rjson_0.2.23                 
[149] ape_5.8-1                     jsonlite_1.8.9               
[151] bitops_1.0-9                  minfiData_0.48.0             
[153] bit64_4.6.0-1                 yulab.utils_0.2.0            
[155] base64_2.0.2                  ReactomePA_1.46.0            
[157] RcppParallel_5.1.10           jquerylib_0.1.4              
[159] GOSemSim_2.28.1               dqrng_0.4.1                  
[161] R.utils_2.12.3                lazyeval_0.2.2               
[163] shiny_1.10.0                  htmltools_0.5.8.1            
[165] enrichplot_1.22.0             rappdirs_0.3.3               
[167] ensembldb_2.26.0              glue_1.8.0                   
[169] RCurl_1.98-1.16               treeio_1.26.0                
[171] mclust_6.1.1                  BSgenome_1.70.2              
[173] jpeg_0.1-10                   igraph_2.1.4                 
[175] R6_2.5.1                      tidyr_1.3.1                  
[177] labeling_0.4.3                cluster_2.1.8                
[179] rngtools_1.5.2                Rhdf5lib_1.24.2              
[181] beanplot_1.3.1                aplot_0.2.4                  
[183] bsseq_1.38.0                  DelayedArray_0.28.0          
[185] tidyselect_1.2.1              ProtGenerics_1.34.0          
[187] htmlTable_2.4.3               ggforce_0.4.2                
[189] xml2_1.3.6                    rsvd_1.0.5                   
[191] munsell_0.5.1                 nor1mix_1.3-3                
[193] htmlwidgets_1.6.4             fgsea_1.28.0                 
[195] biomaRt_2.58.2                rlang_1.1.5                  
[197] ggnewscale_0.5.0